Finite-Memory Strategies in POMDPs with Long-Run Average Objectives
Authors
Abstract
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems that exhibit both probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with a long-run average objective, the decision maker has approximately optimal strategies with finite memory. Notably, this implies that the approximate value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.
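The notion of a finite-memory strategy in a POMDP with a long-run average objective can be illustrated with a small simulation. Everything below (the two hidden states, the noisy observation model, the transition probabilities, and the strategy itself) is an illustrative assumption, not an example taken from the paper; the strategy uses the smallest possible memory, acting only on the last observation.

```python
import random

# Toy POMDP (all dynamics are illustrative assumptions):
# hidden states, two actions, and a noisy observation of the hidden state.
STATES = ["good", "bad"]
ACTIONS = ["stay", "reset"]
OBS = ["high", "low"]

def step(state, action, rng):
    """Return (next_state, observation, reward) under the assumed dynamics."""
    if action == "reset":
        nxt = "good" if rng.random() < 0.9 else "bad"
    else:  # "stay": the current state persists with probability 0.8
        nxt = state if rng.random() < 0.8 else ("bad" if state == "good" else "good")
    reward = 1.0 if nxt == "good" else 0.0
    # The observation reports the hidden state correctly with probability 0.85.
    obs = ("high" if nxt == "good" else "low") if rng.random() < 0.85 else \
          ("low" if nxt == "good" else "high")
    return nxt, obs, reward

def memoryless_strategy(obs):
    """A one-memory-state strategy: decide from the last observation only."""
    return "stay" if obs == "high" else "reset"

def average_reward(strategy, horizon=200_000, seed=0):
    """Estimate the long-run average reward of a strategy by simulation."""
    rng = random.Random(seed)
    state, obs, total = "good", "high", 0.0
    for _ in range(horizon):
        state, obs, r = step(state, strategy(obs), rng)
        total += r
    return total / horizon

print(round(average_reward(memoryless_strategy), 3))
```

The paper's result concerns strategies of this finite-memory form: memory is a finite set of internal states updated from observations, of which the memoryless strategy above is the simplest case.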
Similar resources
Markov Decision Processes with Multiple Long-Run Average Objectives
We consider Markov decision processes (MDPs) with multiple long-run average objectives. Such MDPs occur in design problems where one wishes to simultaneously optimize several criteria, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto optimal point can be ε-approximated by a memoryless strate...
Magnifying Lens Abstraction for Stochastic Games with Discounted and Long-run Average Objectives
Turn-based stochastic games and their important subclass, Markov decision processes (MDPs), provide models for systems with both probabilistic and nondeterministic behaviors. We consider turn-based stochastic games with two classical quantitative objectives: discounted-sum and long-run average objectives. The game models and the quantitative objectives are widely used in probabilistic verification, ...
Strategy Synthesis for Stochastic Games with Multiple Long-Run Objectives
We consider turn-based stochastic games whose winning conditions are conjunctions of satisfaction objectives for long-run average rewards, and address the problem of finding a strategy that almost surely maintains the averages above a given multi-dimensional threshold vector. We show that strategies constructed from Pareto set approximations of expected energy objectives are ε-optimal for the c...
Sensor Synthesis for POMDPs with Reachability Objectives
Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize "weakest" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent tha...
Reinforcement learning for long-run average cost
A large class of sequential decision-making problems under uncertainty can be modeled as Markov and Semi-Markov Decision Problems, when their underlying probability structure has a Markov chain. They may be solved by using classical dynamic programming methods. However, dynamic programming methods suffer from the curse of dimensionality and break down rapidly in face of large state spaces. In a...
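The classical dynamic-programming approach mentioned above can be sketched for the long-run average-cost criterion with relative value iteration on a small MDP. The two-state MDP below (transition matrices `P` and rewards `R`) is an illustrative assumption, not an example from the text; the point is only to show the kind of exact method that becomes infeasible as the state space grows.

```python
import numpy as np

# Relative value iteration for the optimal long-run average reward (gain)
# of a small unichain MDP. The MDP below is an illustrative assumption.

# P[a][s][s'] : transition probabilities for action a; R[a][s] : expected reward.
P = np.array([
    [[0.9, 0.1],   # action 0 from state 0
     [0.4, 0.6]],  # action 0 from state 1
    [[0.2, 0.8],   # action 1 from state 0
     [0.1, 0.9]],  # action 1 from state 1
])
R = np.array([
    [1.0, 0.0],    # action 0 rewards in states 0, 1
    [2.0, 0.5],    # action 1 rewards in states 0, 1
])

def relative_value_iteration(P, R, ref_state=0, tol=1e-10, max_iter=10_000):
    """Return (gain, bias): gain approximates the optimal average reward."""
    n = P.shape[1]
    h = np.zeros(n)
    gain = 0.0
    for _ in range(max_iter):
        q = R + P @ h              # q[a, s] = one-step lookahead value
        v = q.max(axis=0)          # greedy backup over actions
        gain = v[ref_state]        # subtract a reference value to keep h bounded
        h_new = v - gain
        if np.max(np.abs(h_new - h)) < tol:
            return gain, h_new
        h = h_new
    return gain, h

gain, h = relative_value_iteration(P, R)
print(round(gain, 4))
```

The table-based backup over all states in each iteration is exactly what the curse of dimensionality defeats: its cost grows with the size of the state space, which motivates the simulation-based reinforcement-learning methods the text discusses.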
Journal
Journal title: Mathematics of Operations Research
Year: 2022
ISSN: 0364-765X, 1526-5471
DOI: https://doi.org/10.1287/moor.2020.1116